AITopics | stochastic decision

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2604.25269

Genre: Research Report (0.69)

Industry:

Transportation > Infrastructure & Services (0.34)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.41)
Information Technology > Data Science > Data Mining > Big Data (0.34)


Online combinatorial optimization with stochastic decision sets and adversarial losses

Neural Information Processing Systems, Dec-27-2025, 15:05:28 GMT

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

name change, online combinatorial optimization, stochastic decision, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Oct-2-2025, 17:22:17 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. In this problem, a subset $S_t$ of available actions is drawn stochastically from a decision set $S \subseteq \{0,1\}^d$. The algorithm then chooses an action $v \in S_t$ and incurs a loss of $v^\top \ell_t$, where $\ell_t$ is a loss vector chosen by an oblivious adversary. The paper studies the problem in three settings: full information, semi-bandit, and a new setting that the authors term restricted. Results: Sublinear regret bounds were previously known in the full information and semi-bandit settings.
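The interaction protocol summarized in this review can be illustrated with a small simulation. The following is a minimal sketch, not the authors' algorithm: it runs a plain Follow-The-Perturbed-Leader rule in the full information setting and does not implement Counting Asleep Times. The function name `fpl_stochastic_availability`, the independent per-component availability model, the exponential perturbations, and the learning rate `eta` are all assumptions made for illustration.

```python
import numpy as np


def fpl_stochastic_availability(decision_set, losses, avail_prob, eta, seed=0):
    """Illustrative FPL loop with stochastically available actions.

    decision_set: (n, d) 0/1 array, one row per action v in S (assumed small
                  enough to enumerate).
    losses:       (T, d) array, the loss vectors l_t fixed in advance
                  (oblivious adversary).
    avail_prob:   scalar or (d,) array; each component is assumed to be
                  available independently with this probability.
    eta:          learning rate (hypothetical tuning parameter).
    """
    rng = np.random.default_rng(seed)
    T, d = losses.shape
    uses = decision_set.astype(bool)
    cum_loss = np.zeros(d)  # cumulative observed loss per component
    total_loss = 0.0
    picks = []
    for t in range(T):
        # Draw which components are available in this round.
        up = rng.random(d) < avail_prob
        # An action is in S_t only if it uses no unavailable component.
        awake = ~np.any(uses & ~up, axis=1)
        if not awake.any():
            picks.append(None)  # nothing can be played this round
        else:
            # Perturb the cumulative losses and follow the leader in S_t.
            z = rng.exponential(size=d)
            scores = decision_set @ (eta * cum_loss - z)
            scores = np.where(awake, scores, np.inf)
            k = int(np.argmin(scores))
            total_loss += float(decision_set[k] @ losses[t])
            picks.append(k)
        # Full information: the whole loss vector is revealed.
        cum_loss += losses[t]
    return total_loss, picks
```

With two one-hot actions, full availability, and a loss of 1 on the first component in every round, the learner settles on the second action after the first round. In the semi-bandit and restricted settings the update `cum_loss += losses[t]` would have to be replaced by a loss estimate built from the observed components only, which is where the paper's estimation technique comes in.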

algorithm, loss vector, vector, (12 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.92)


Online combinatorial optimization with stochastic decision sets and adversarial losses

Neural Information Processing Systems, Sep-30-2025, 08:20:56 GMT

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

name change, online combinatorial optimization, stochastic decision, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)


Online combinatorial optimization with stochastic decision sets and adversarial losses

Neural Information Processing Systems, Feb-9-2025, 19:42:27 GMT

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Poland (0.04)
Europe > France (0.04)

Industry:

Transportation > Infrastructure & Services (0.34)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.51)


Online combinatorial optimization with stochastic decision sets and adversarial losses

Neural Information Processing Systems, Mar-13-2024, 12:43:20 GMT

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

algorithm, learner, loss estimate, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Poland (0.04)
Europe > France (0.04)

Industry:

Transportation > Infrastructure & Services (0.34)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.51)


Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs

Zhou, Zihan, Wei, Honghao, Ying, Lei

arXiv.org Artificial Intelligence, Feb-5-2024

This paper considers the best policy identification (BPI) problem in online Constrained Markov Decision Processes (CMDPs). We are interested in algorithms that are model-free, have low regret, and identify an approximately optimal policy with high probability. Existing model-free algorithms for online CMDPs with sublinear regret and constraint violation do not provide any convergence guarantee to an optimal policy and provide only average performance guarantees when a policy is sampled uniformly at random from all previously used policies. In this paper, we develop a new algorithm, named Pruning-Refinement-Identification (PRI), based on a previously proved fundamental structural property of CMDPs, which we call limited stochasticity. The property says that for a CMDP with $N$ constraints, there exists an optimal policy with at most $N$ stochastic decisions. The proposed algorithm first identifies at which step and in which state a stochastic decision has to be taken and then fine-tunes the distributions of these stochastic decisions. PRI achieves three objectives: (i) PRI is a model-free algorithm; (ii) it outputs an approximately optimal policy with high probability at the end of learning; and (iii) PRI guarantees $\tilde{\mathcal{O}}(H\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(H^4 \sqrt{SA}K^{\frac{4}{5}})$ under a model-free algorithm, where $H$ is the length of each episode, $S$ is the number of states, $A$ is the number of actions, and the total number of episodes during learning is $2K+\tilde{\mathcal{O}}(K^{0.25})$.

algorithm, constraint violation, optimal policy, (15 more...)

arXiv.org Artificial Intelligence

2309.15395

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Washington (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)


Online combinatorial optimization with stochastic decision sets and adversarial losses

Neu, Gergely, Valko, Michal

Neural Information Processing Systems, Feb-14-2020, 10:43:35 GMT

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times.

algorithm, online combinatorial optimization, stochastic decision, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)


Discovering novel phenotypes with automatically inferred dynamic models: a partial melanocyte conversion in Xenopus

#artificialintelligence, Jan-31-2017, 01:25:19 GMT

One of the key areas in which artificial intelligence and the information sciences can contribute to biology is by helping human scientists understand cellular behavior in the context of a complex organism [1,2]. The utility of these methods is their ability to find novel regulatory interactions [10] and even novel necessary regulatory genes [11]. These methods are indeed becoming indispensable for understanding the complex coordination of signals necessary to develop and maintain correct body shapes and organs. Moreover, such methods are required in order to develop interventions to make rational changes to complex anatomy and physiology, in the context of regenerative medicine and systems-level diseases such as cancer [12]. The coordination of cellular behavior towards the anatomical needs of the host organism, and away from tumorigenesis, is achieved in part via bioelectrical communication among many cell types [13,14,15,16,17,18,19]. Recent work showed that depolarization of resting potential in a special cell population in Xenopus embryos, so-called instructor cells, results in a metastatic-like conversion of normal melanocytes [20].

artificial intelligence, conversion, melanocyte, (14 more...)

#artificialintelligence

Industry: Health & Medicine (0.98)

Technology: Information Technology > Artificial Intelligence (0.51)


Online combinatorial optimization with stochastic decision sets and adversarial losses

Neu, Gergely, Valko, Michal

Neural Information Processing Systems, Dec-31-2014

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

algorithm, learner, loss estimate, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Poland (0.04)
Europe > France (0.04)

Industry:

Transportation > Infrastructure & Services (0.34)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.51)

